Abstract

Background: GDSL esterases/lipases are a newly discovered subclass of lipolytic enzymes that are very important and attractive research subjects because of their multifunctional properties, such as broad substrate specificity and regiospecificity. Compared with current knowledge of these enzymes in bacteria, our understanding of the plant GDSL enzymes is very limited, although the GDSL gene family includes numerous members in many fully sequenced plant genomes. Only two genes from the large rice GDSL esterase/lipase gene family have previously been characterised, and the functions of most members remain unknown. In the present study, we describe the rice OsGELP (Oryza sativa GDSL esterase/lipase protein) gene family at the genomic and proteomic levels, and use this knowledge to provide insights into the multifunctionality of the rice OsGELP enzymes.

Results: In this study, an extensive bioinformatics analysis identified 114 genes in the rice OsGELP gene family. A complete overview of this family in rice is presented, including chromosome locations, gene structures, phylogeny, and protein motifs. Among the OsGELPs and the plant GDSL esterase/lipase proteins of known function, 41 motifs were found that represent the core secondary structure elements or appear specifically in different phylogenetic subclades. The distribution of the identified putative conserved clade-common and clade-specific peptide motifs, and their locations on the predicted three-dimensional protein structure, may indicate their functional roles. Regions potentially important for substrate specificity are highlighted on the basis of the three-dimensional protein model and the locations of the phylogenetically specific conserved motifs. The differential expression of some representative genes was confirmed by quantitative real-time PCR. The phylogenetic analysis, protein motif architectures, and expression profiles were combined to predict the possible biological functions of the rice OsGELP genes.

Conclusions: Our genomic analysis presents, for the first time, fundamental information on the organization of the rice OsGELP gene family. By combining genomic, phylogenetic, microarray expression, protein motif distribution, and protein structure analyses, we were able to establish a well-supported basis for predicting the functions of many members of the rice GDSL esterase/lipase family. The present study provides a platform for selecting candidate genes for further detailed functional studies.
Background

The GDSL motif enzymes are a relatively recently discovered group of lipases, many of whose characteristics have not yet been fully and precisely described [1,2]. Since 1995, when Upton and Buckley first reported the new GDS[L]-motif-like subfamily of lipases (Pfam PF00657), new questions have arisen about the specific functions of these fascinating lipolytic enzymes.

The number of lipases (EC 3.1.1.3) and esterases (EC 3.1.1.1) that have been studied has increased tremendously over the last decades. The lipase and esterase families belong to the hydrolases, a class of enzymes with very broad substrate specificity. All enzymes in these families contain a catalytic triad composed of serine (Ser), aspartate (or glutamate), and histidine (His) residues. In lipases, the nucleophile is a Ser residue that is part of the highly conserved motif Gly-X-Ser-X-Gly (X being any amino acid), positioned in the middle of the amino acid sequence. In contrast, enzymes of the GDSL family of esterases/lipases share five blocks of highly conserved homology, which are important for their classification, and their active-site Ser is located close to the N-terminus. The GDSL family is further classified as SGNH hydrolases because of the strictly conserved residues Ser, Gly, Asn, and His in conserved blocks I, II, III, and V, respectively [1-3]. Two other proton donors to the oxyanion hole are the glycine (Gly) residue in block II and the asparagine (Asn) in block III. The His residue in block V serves as a base that makes the Ser in block I more nucleophilic by deprotonating its hydroxyl group. A further characteristic of block V is an aspartate (Asp) located three residues upstream of His (the DxxH motif), which serves as the third member of the catalytic triad. Unlike other lipases, GDSL hydrolases have a flexible active site that changes conformation in the presence of different substrates; hence, some GDSL enzymes have broadly diverse enzymatic activities, including esterase and protease activity in the same enzyme [4,5].
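To make the block definitions above concrete, the following Python sketch scans a protein sequence with regular expressions for a GDSL-type block I near the N-terminus, the classical Gly-X-Ser-X-Gly lipase motif, and the DxxH signature of block V. The simplified patterns, the 60-residue "N-terminal window", and the toy sequence are illustrative assumptions; the sketch is not intended to reproduce the search procedure used for the OsGELP family, and profile-based searches (for example against Pfam PF00657) are far more reliable than single regular expressions.

```python
import re

# Simplified, illustrative patterns (assumptions, not curated family definitions).
BLOCK_I = re.compile(r"GDS[A-Z]")             # GDSL-like block I carrying the nucleophilic Ser
GXSXG = re.compile(r"G[A-Z]S[A-Z]G")          # classical lipase nucleophile motif
BLOCK_V = re.compile(r"(?=(D[A-Z]{2}H))")     # DxxH: Asp three residues before His (overlapping matches)


def scan_motifs(seq: str, n_window: int = 60) -> dict:
    """Return 1-based start positions of each pattern in a protein sequence.

    n_window is a hypothetical cut-off for what counts as "near the N-terminus".
    """
    seq = seq.upper()
    block_i = [m.start() + 1 for m in BLOCK_I.finditer(seq) if m.start() < n_window]
    gxsxg = [m.start() + 1 for m in GXSXG.finditer(seq)]
    dxxh = [m.start() + 1 for m in BLOCK_V.finditer(seq)]
    return {
        "block_I_near_N_terminus": block_i,
        "GxSxG": gxsxg,
        "DxxH": dxxh,
        # GDSL-like candidate: block I near the N-terminus plus a DxxH further downstream.
        "gdsl_like": bool(block_i) and any(p > block_i[0] for p in dxxh),
    }


# Toy sequence made up for illustration; it is not a real OsGELP protein.
toy = "MASKLLVFGDSLSDTGNNNYLPTSARANFPPYGIDFPGGRATGRFSNGKLVPDLIAQ" + "A" * 40 + "KYVFWDGFHPTEAAN"
result = scan_motifs(toy)
print(result)
```

A sequence lacking both signatures, such as a poly-alanine string, is reported with `gdsl_like` set to False.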

The GDSL esterases/lipases are found in all kingdoms of life. Because of their broad substrate specificity, these highly promising enzymes could be used in biotechnological applications across a wide range of industries (e.g. the food, fragrance, cosmetics, textile, pharmaceutical, and detergent industries) [3]. They have been identified in a wide range of organisms, and several GDSL Ser esterases/lipases have been cloned and characterized. Many GDSL esterases/lipases have been found in bacteria, and progress has been made toward uncovering their structures, functions, and physiological roles [6-20]. At present, crystal structures are available for GDSL esterases/lipases from Streptomyces scabies, Escherichia coli, Pseudomonas fluorescens, Mycobacterium smegmatis, and Pseudomonas aeruginosa [21-28]. Their mature enzymes display broad hydrolytic activity towards different types of substrates, including acyl-CoAs, a variety of esters, and amino acid derivatives.

All GDSL esterase/lipase structures described to date belong to the α/β hydrolase fold superfamily of proteins. The main difference from the classical α/β hydrolase fold is the distinct location of the residues involved in active-site formation, which results in a different orientation of the catalytic triad relative to the central parallel β-sheet [4,25]. The structures of GDSL esterase/lipase proteins from several bacterial species have recently been determined [21,23,25-28], but no plant structure has been resolved yet.

GDSL esterases/lipases have also been found in plant species and have become very attractive research subjects because of their newly discovered properties and functions. In the plant kingdom, this family is now represented by about 1,100 members from twelve fully sequenced plant genomes. The GDSL family of Arabidopsis thaliana has been reported to consist of 108 members [29], and Vitis vinifera, Sorghum bicolor, Populus trichocarpa, and Physcomitrella patens contain 96, 130, 126, and 57 members, respectively [30]. Searches across multiple databases revealed 114 members in Oryza sativa, 53 in Zea mays, 90 in Selaginella moellendorffii, 88 in Medicago truncatula, 102 in Chlamydomonas reinhardtii, 59 in Ostreococcus tauri, and 75 in Phaeodactylum tricornutum [31,32]. Several plant GDSL esterases/lipases have been isolated, cloned, and characterized. Physiologically, the plant GDSL esterases/lipases described so far are mainly involved in the regulation of plant development, morphogenesis, synthesis of secondary metabolites, and defence responses [33-55].
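The per-genome counts quoted above can be tallied with a few lines of Python. The numbers are copied directly from the text; the script only sums them and reports each genome's share. The twelve counts add up to 1,098, i.e. roughly 1,100 members.

```python
# Per-genome GDSL esterase/lipase counts as quoted in the text [29-32].
counts = {
    "Arabidopsis thaliana": 108,
    "Vitis vinifera": 96,
    "Sorghum bicolor": 130,
    "Populus trichocarpa": 126,
    "Physcomitrella patens": 57,
    "Oryza sativa": 114,
    "Zea mays": 53,
    "Selaginella moellendorffii": 90,
    "Medicago truncatula": 88,
    "Chlamydomonas reinhardtii": 102,
    "Ostreococcus tauri": 59,
    "Phaeodactylum tricornutum": 75,
}

total = sum(counts.values())
print(len(counts), "genomes,", total, "members")

# List genomes from largest to smallest family, with each genome's share of the total.
for species, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{species:28s} {n:4d} ({100 * n / total:.1f}%)")
```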

Rice has become a model plant for genomic research in monocotyledonous species because of its small genome size and economic importance, but our knowledge of the GDSL esterase/lipase gene family in rice is rather limited. Although the rice genome contains more than 100 members of this family, the functions and properties of most of them remain unknown. Currently, only two rice GDSL esterase/lipase genes have been reported: GDSL-containing enzyme rice 1 (GER1) and wilted dwarf and lethal 1 (WDL1) were cloned from the rice genome, and their physiological roles were proposed to be the regulation of coleoptile elongation and of plant growth at the seedling stage, respectively [56,57].

In the present study, 114 OsGELP genes were identified in rice. This is the first genome-wide bioinformatics survey of the OsGELP gene family, describing the genomic distribution and gene structure of the OsGELP genes, their phylogeny, and the motif analysis and structure modelling of the OsGELP proteins. More than 30 additional clade-common and clade-specific peptide motifs outside the GDSL domain were uncovered and described, and their putative functions were proposed on the basis of the GDSL-lipase protein tertiary structure. Regions potentially important for substrate specificity and binding, as well as functional grouping according to phylogenetic relationships, are discussed. The expression patterns of some representative genes, analysed by quantitative real-time PCR in response to cytokinin treatment, matched the digital expression results. The microarray expression profiles under different treatment conditions and the phylogenetic relatedness of the genes were analysed to predict their functions in rice.

Chepyshko et al. BMC Genomics 2012, 13:309
http://www.biomedcentral.com/1471-2164/13/309

Considering that very few OsGELP genes have been characterized to date, the results reported here are a first step towards understanding the roles of the GDSL esterases/lipases in rice and provide a solid foundation for predicting the functions of these enzymes. Our work provides a framework for selecting appropriate candidate genes for subsequent functional analysis of the OsGELP family members.

Results 

Identification of the GDSL esterase/lipase family genes in rice

A total of 114 putative OsGELP genes were identified and designated OsGELP1 to OsGELP114 according to their order and position on chromosomes 1-12, from top to bottom. The gene names, locus IDs, accession numbers for coding sequences (CDSs), genomic DNA, and cDNA, and the predicted isoelectric points of all 114 OsGELP genes are listed in Additional file 1. The open reading frame (ORF) sizes of the OsGELP genes range from 570 bp (OsGELP76) to 1,362 bp (OsGELP30), with an average length of 1,097 bp.
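The naming rule described above (numbering genes by chromosome, then by position from top to bottom) amounts to a sort on (chromosome, start coordinate). The sketch below assumes that "top" corresponds to the smallest coordinate on the chromosome; the locus IDs and coordinates are hypothetical placeholders, not real RGAP loci.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    locus_id: str   # hypothetical identifier
    chromosome: int  # 1-12
    start: int      # coordinate in bp; assumed to increase from the "top" of the chromosome


def assign_names(loci, prefix="OsGELP"):
    """Number loci consecutively by chromosome, then by position along the chromosome."""
    ordered = sorted(loci, key=lambda loc: (loc.chromosome, loc.start))
    return {loc.locus_id: f"{prefix}{i}" for i, loc in enumerate(ordered, start=1)}


# Hypothetical loci, for illustration only.
demo = [
    Locus("LOC_A", 2, 5_000),
    Locus("LOC_B", 1, 90_000),
    Locus("LOC_C", 1, 12_000),
]
names = assign_names(demo)
print(names)
```

Both loci on chromosome 1 are numbered before the chromosome 2 locus, and within chromosome 1 the locus closer to the top comes first.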

Most of the OsGELP genes are expressed in various organs. Ninety-nine genes have one or more full-length cDNAs (FL-cDNAs) and/or expressed sequence tags (ESTs) (Additional file 2). The expression of 13 other genes was confirmed by microarray data available at Genevestigator [58], and two genes (OsGELP9 and OsGELP13) had only MPSS data support (Figure 1). The number of ESTs mapped to each OsGELP gene was highly variable, ranging from 1-3, indicating marginal expression (e.g., OsGELP11, 34, 52, 68, 82, 89, and 102), to between 100 and more than 200, indicating strong expression (OsGELP3, 6, 53, 63, 77, 79, and 85) (Additional file 2).



In total, 28 of the 114 OsGELP genes (24.6%) were predicted to be alternatively spliced by the Rice Genome Annotation Project (RGAP) database (release 6.1). These OsGELP genes are present in two to four alternatively spliced forms, giving rise to a total of 68 transcripts (Additional file 1). This proportion is slightly higher than that predicted for rice genes overall [59]. The expression of 33 of the 68 transcripts was confirmed by FL-cDNA evidence (Figure 1, Additional file 2). Several errors were observed in the automated annotation of the rice genome, including intron/exon numbers and positions; these were corrected according to the rice FL-cDNA sequences from the Knowledge-based Oryza Molecular Biological Encyclopedia database (KOME) [60]. For example, the annotations of two genes, OsGELP79 and OsGELP113, were corrected: their structures were changed from 2 exons/1 intron to 3 exons/2 introns, and from 4 exons/3 introns to 5 exons/4 introns, respectively. Their predicted ORF sizes were also revised according to the available FL-cDNAs (AK066113 and AK063071), from 1,107 and 1,272 bp to 1,026 and 846 bp, respectively.
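As a rough illustration of the kind of re-annotation described above, the sketch below derives exon and intron counts and the CDS length from a list of exon coordinates, and checks whether the length is a whole number of codons. The coordinates are hypothetical values chosen only so that the two models reproduce the 1,107 bp and 1,026 bp ORF sizes quoted for OsGELP79; they are not the real gene models.

```python
def summarize_cds(exons):
    """Summarize a gene model given CDS exon coordinates (1-based, inclusive, any order).

    Returns (exon count, intron count, CDS length in bp, length divisible by 3).
    """
    exons = sorted(exons)
    # Reject models whose exons overlap, which would make the CDS length meaningless.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if next_start <= prev_end:
            raise ValueError("overlapping exons")
    length = sum(end - start + 1 for start, end in exons)
    return len(exons), len(exons) - 1, length, length % 3 == 0


# Hypothetical coordinates (not the real OsGELP79 models): a 2-exon model re-annotated as 3 exons.
old_model = [(1, 600), (801, 1307)]
new_model = [(1, 300), (401, 700), (901, 1326)]
print(summarize_cds(old_model))
print(summarize_cds(new_model))
```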

Chromosomal distribution, gene structure and evolutionary expansion of the OsGELP genes

Figure 2 is a diagrammatic representation of the chromosomal distribution and direction of transcription of the OsGELP genes on the 12 rice chromosomes. As shown in Figure 2, OsGELP genes are present on every chromosome, but their distribution is uneven. For example, chromosome 1 carries the highest proportion of OsGELP genes (24.6%), with a relatively high density in some chromosomal regions (Figure 2). Many genes are also concentrated on chromosomes 2 and 6 (14.9% each) and 5 (12.3%), whereas rice chromosomes 8 and 12 contain only two OsGELP loci each. In total, 46.5% of the OsGELP genes lie close together on the chromosomes. These 54 OsGELP genes form 17 clusters, in which closely linked genes are adjacent or separated by no more than four unrelated genes (Figure 2, Additional file 3). Interestingly, the genes that interrupt the OsGELP gene clusters encode mostly small hypothetical or expressed proteins and large retrotransposon/transposon proteins. Seven clusters (I, II, IV, VI, IX, XI, and XIV), located on chromosomes 1, 2, 3, 5, and 6, contain a large number of transposable element (TE)-related genes inserted between 26 OsGELP genes.
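The cluster criterion stated above (family members adjacent or separated by no more than four unrelated genes) can be sketched as a single pass along an ordered gene list. Chaining consecutive members into one cluster whenever the gap criterion holds is an assumption about how the rule is applied; the gene IDs are hypothetical.

```python
def find_clusters(gene_order, is_member, max_gap=4):
    """Group family members lying within max_gap unrelated genes of the previous member.

    gene_order: gene IDs along one chromosome, in positional order.
    is_member: predicate returning True for family members (e.g. OsGELP genes).
    Only groups of two or more members are reported as clusters.
    """
    clusters, current, last_idx = [], [], None
    for idx, gene in enumerate(gene_order):
        if not is_member(gene):
            continue
        # idx - last_idx - 1 = number of unrelated genes since the previous member.
        if last_idx is not None and idx - last_idx - 1 <= max_gap:
            current.append(gene)
        else:
            if len(current) >= 2:
                clusters.append(current)
            current = [gene]
        last_idx = idx
    if len(current) >= 2:
        clusters.append(current)
    return clusters


# Toy chromosome: "G" = family member, "x" = unrelated gene (hypothetical IDs).
order = ["G1", "x1", "G2", "x2", "x3", "x4", "x5", "G3",
         "x6", "x7", "x8", "x9", "x10", "G4", "G5"]
print(find_clusters(order, lambda g: g.startswith("G")))
```

Here G3 still joins the first cluster (four unrelated genes after G2), whereas G4 starts a new one (five unrelated genes after G3).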
To understand the mechanisms underlying the evolution of the OsGELP gene family, both tandem and segmental duplication events were examined. A considerable proportion (19.3%) of the OsGELP genes lie on duplicated chromosomal segments of rice (Additional file 4). Furthermore, 25 of the 114 OsGELP genes that are clustered in the same chromosomal regions (Figure 2) form eight groups of tandemly duplicated genes. Notably, using the phylogenetic study of Volokita et al. [30], we identified 53 outparalogous genes (46.5%) that were duplicated after the eudicot-monocot split but before the divergence of sorghum and rice (Additional file 5).

Figure 1 The rice OsGELP gene expression anatomy viewer. The expression patterns of 121 transcripts of 114 OsGELP genes in different rice tissues are shown. The evidence of gene expression is based on EST, FL-cDNA, MPSS, and Genevestigator data. A positive signal is indicated by a coloured box as follows: light blue for seed, light green for shoot, orange for mixed tissue, dirty green for callus, dark blue for panicle, light pink for pistil, green for leaf, black for root, red for flower, light yellow for whole plant, dark pink for anther, purple for immature seed, blue for endosperm, and lime for seedling. A white box indicates that no expression was observed. The colour in the cDNA column designates the tissue library from which cDNA support was obtained. Black points indicate the availability of expression data.

There is no consensus regarding the number of exons and introns in the GDSL gene structure. In most cases (49.1%), the OsGELP genes are interrupted by four introns and consist of five exons within their coding regions (Additional file 6), which is consistent with the global analysis of gene structure in the rice genome [61]. In the other genes, the number of introns in the ORF varies from one to six, except for OsGELP39, which is intronless. The pattern with the highest number of exons was observed only in OsGELP109 (seven exons and six introns), whereas 4, 27, 16, and 9 genes have six/five, four/three, three/two, and two/one exon/intron patterns, respectively.

The chromosomal regions in which the candidate genes reside vary in size. Their genomic sequence lengths range from 1,009 to 24,799 bp, largely because of long introns (Additional file 7). In 45.6% of the OsGELP genes, intron sizes exceed 1,000 bp. The OsGELP21 and OsGELP97 genes contain introns more than 10-fold longer than those of the other genes in the family. These two huge introns, of 12,861 and 11,743 bp, are retained in all alternatively spliced forms, and 13 and 12 repetitive elements, respectively, were detected within them. These elements represent different types of miniature inverted-repeat transposable elements (MITEs), transposons, and retrotransposons. More generally, diverse repetitive sequences from several superclasses, with a variety of sizes, were found within the introns, exons, and 5' or 3' untranslated regions (UTRs) of 71 OsGELP genes (Additional file 8).

[Figure 2: number of OsGELP genes per chromosome (Chr1-Chr12): 28, 17, 10, 4, 14, 17, 4, 2, 5, 8, 3, 2]

Figure 2 Genomic distribution of the OsGELP genes in rice chromosomes. The OsGELP genes are numbered 1-114. The white rectangles on the chromosomes (vertical bars) indicate the positions of the centromeres. Chromosome numbers are indicated at the top of each bar, and the number in parentheses corresponds to the number of OsGELP genes present on that chromosome. OsGELP genes present on duplicated chromosomal segments are connected by coloured lines (one colour per chromosome). Tandemly duplicated genes are shown in boxes of the same colour. Roman numerals and solid vertical black lines number and delimit the groups of closely linked genes identified as clusters. Blue and red triangles indicate upward and downward directions of transcription, respectively.

Phylogenetic analysis and evolution of the OsGELP genes 

To study the evolutionary relationships among members of the OsGELP gene family, as well as the phylogenetic relationships between the rice OsGELP genes and other plant GDSL genes whose putative functions have recently been elucidated, unrooted phylogenetic trees were constructed from multiple alignments of their protein sequences by the neighbour-joining (NJ) method and displayed using the Molecular Evolutionary Genetics Analysis (MEGA4) program.
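For readers unfamiliar with the neighbour-joining method, the following is a minimal textbook implementation that operates on a pairwise distance matrix. It is a sketch of the general algorithm, not the MEGA4 or ClustalW code used in the study, and it omits bootstrapping and the computation of distances from the alignment. The five-taxon matrix is a standard additive teaching example (not OsGELP data), for which neighbour joining returns a tree whose path lengths reproduce the input distances exactly.

```python
from collections import defaultdict


def neighbor_joining(dist):
    """Textbook neighbour joining on a symmetric distance matrix.

    dist: {taxon: {other_taxon: distance}}; diagonal entries are optional.
    Returns an unrooted tree as an adjacency map {node: {neighbour: branch_length}},
    with internal nodes named "node0", "node1", ...
    """
    d = {i: {j: v for j, v in row.items() if j != i} for i, row in dist.items()}
    tree = defaultdict(dict)
    counter = 0
    while len(d) > 2:
        nodes = list(d)
        n = len(nodes)
        r = {i: sum(d[i].values()) for i in nodes}  # total distance from each node
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        _, i, j = min(
            ((n - 2) * d[a][b] - r[a] - r[b], a, b)
            for x, a in enumerate(nodes)
            for b in nodes[x + 1:]
        )
        u = f"node{counter}"
        counter += 1
        # Branch lengths from the new internal node u to the joined nodes i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        tree[u][i] = tree[i][u] = li
        tree[u][j] = tree[j][u] = d[i][j] - li
        # Distances from u to every remaining node; then remove i and j from the matrix.
        d[u] = {k: 0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in nodes if k not in (i, j)}
        for k in d[u]:
            d[k][u] = d[u][k]
            del d[k][i], d[k][j]
        del d[i], d[j]
    a, b = d  # connect the last two remaining nodes
    tree[a][b] = tree[b][a] = d[a][b]
    return dict(tree)


def path_length(tree, src, dst):
    """Sum of branch lengths on the unique path between two nodes of the tree."""
    stack = [(src, None, 0.0)]
    while stack:
        node, parent, total = stack.pop()
        if node == dst:
            return total
        for nb, length in tree[node].items():
            if nb != parent:
                stack.append((nb, node, total + length))
    raise KeyError(dst)


# Additive five-taxon teaching example (not OsGELP data).
D = {
    "a": {"b": 5, "c": 9, "d": 9, "e": 8},
    "b": {"a": 5, "c": 10, "d": 10, "e": 9},
    "c": {"a": 9, "b": 10, "d": 8, "e": 7},
    "d": {"a": 9, "b": 10, "c": 8, "e": 3},
    "e": {"a": 8, "b": 9, "c": 7, "d": 3},
}
tree = neighbor_joining(D)
print(tree["a"], tree["b"])
```

Real protein distances are rarely additive, so on alignment-derived distances the path lengths only approximate the input values, and branch support would be assessed by bootstrap resampling of alignment columns.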

For the rice OsGELP phylogenetic tree, a dataset of 96 protein sequences sharing 13 conserved alignment regions was assembled, including characteristic GDSL esterase/lipase features such as blocks I, II, III, and V. The remaining 18 OsGELP sequences contain gap-rich regions; during evolution, they probably lost some of the common GDSL enzyme blocks as well as other shared regions. For this reason, they were excluded from further phylogenetic analyses (Additional file 9).
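The exclusion of gap-rich sequences can be illustrated by computing, for each aligned sequence, the fraction of gaps across a chosen set of conserved alignment columns. The 50% threshold, the column indices, and the toy alignment below are hypothetical; the text does not state the exact criterion used to exclude the 18 sequences.

```python
def gap_fraction(aligned_seq, columns):
    """Fraction of the given alignment columns (0-based) that are gaps in one sequence."""
    return sum(aligned_seq[c] == "-" for c in columns) / len(columns)


def filter_gap_rich(alignment, conserved_columns, max_gap=0.5):
    """Split aligned sequences into kept and excluded sets by gap fraction.

    max_gap is a hypothetical threshold, not a value reported in the study.
    """
    kept, excluded = {}, {}
    for name, seq in alignment.items():
        target = kept if gap_fraction(seq, conserved_columns) <= max_gap else excluded
        target[name] = seq
    return kept, excluded


# Toy alignment (hypothetical sequences) and conserved column indices.
aln = {
    "seq1": "GDSLVDNGHH",
    "seq2": "GDSY-DNG-H",
    "seq3": "----V---HH",
}
conserved = [0, 1, 2, 5, 6, 7, 9]
kept, excluded = filter_gap_rich(aln, conserved)
print(sorted(kept), sorted(excluded))
```

The gaps in seq2 fall outside the conserved columns, so it is kept; seq3 is gapped in six of the seven conserved columns and is excluded.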

The rice OsGELP gene family was divided into four clades in the final unrooted phylogenetic tree (Figure 3), and clades I and IV can be further subdivided into 12 subclades (six per clade). The grouping of OsGELP genes into subclades is consistent with the predicted segmental and tandem duplication events. OsGELPs from 15 of the 17 genomic clusters showed close phylogenetic relationships, supported by high bootstrap values at their nodes (Figure 3). In total, 62 OsGELP genes form 31 sister pairs: 12 pairs belong to 10 gene clusters, and 7 pairs are segmentally duplicated genes (Figure 3). Each subclade contains one or more sister gene pairs. This suggests that duplication events played a major role in the expansion of the OsGELP gene family in the rice genome.
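Sister pairs are two leaves joined directly to the same internal node, i.e. each other's closest relatives. The sketch below collects them from a tree written as nested tuples. The tree and gene labels are hypothetical and the representation is an assumption; in an unrooted tree the arbitrary rooting point can create or hide a pair at the root, so a real analysis would work on the unrooted topology and take bootstrap support into account.

```python
def sister_pairs(tree):
    """Return leaf pairs that share a parent node (cherries) in a nested-tuple tree.

    tree: nested tuples of leaf names, e.g. (("A", "B"), (("C", "D"), "E")).
    """
    pairs = []

    def walk(node):
        if isinstance(node, str):  # a leaf
            return
        if len(node) == 2 and all(isinstance(child, str) for child in node):
            pairs.append(tuple(node))
        for child in node:
            walk(child)

    walk(tree)
    return pairs


# Hypothetical tree with placeholder gene names.
toy_tree = (("OsGELP_a", "OsGELP_b"), (("OsGELP_c", "OsGELP_d"), "OsGELP_e"))
pairs = sister_pairs(toy_tree)
print(pairs)
```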

Given that orthologs frequently share the same function [30,62], our second unrooted NJ phylogenetic tree combined the 96 rice OsGELP genes with 24 plant GDSL orthologs or homologs whose putative functions have recently been annotated (Figure 4, Additional file 10). In this analysis, the OsGELP genes and their close plant orthologs or homologs were divided into three major subfamilies, represented by clades I, II, and III, and clades I and III were each further separated into six subclades (Figure 4). Among the plant GDSL esterases/lipases of known function, five genes (ARAB-1, AtFXG1, maize AChE, CDEF1, and AtLTL1) were found to be orthologs of 15 OsGELP genes (Additional file 10). Orthologs, as well as close homologs, share more than 40% similarity and group together in the same subclades of the phylogenetic tree. The order of all 12 subclades of the OsGELP tree was preserved in the new combined tree of 120 plant GDSL esterase/lipase genes (Figure 4). The positions of the plant GDSL genes chosen for our study agree with the previously reported tree topology of the GDSL esterase/lipase gene family in land plants (Embryophyta) [30].

Compared with the four clades of the original rice OsGELP phylogenetic tree, the combined tree contains a new clade of plant GDSL genes. This emerging clade (II) is well supported by its bootstrap value (98%) and consists of six GDSL esterase/lipase genes from A. thaliana, Brassica rapa, and Carica papaya, all of which, except one CpEst gene, have been associated with different kinds of biotic stress responses (Figure 4) [33-38,63]. The distinct position of clade II in the tree can be explained by the association of its members with the myrosinase-glucosinolate system, which is almost exclusive to the order Capparales, including the Brassicaceae [34]. This may account for the separation of the clade II genes from the other clades in the phylogenetic tree; every clade II member shows relatively low similarity (below 35%) to the OsGELP genes (Figure 4, Additional file 10).

Relationship between protein motifs and phylogenetic classification

A total of 45 motifs, with statistical significance (E-values) ranging from 1.3e-966 to 9.1e-002, were found among the OsGELPs and the known plant GDSL esterase/lipase proteins (Additional file 11). Motifs 3, 5, 6, and 2 represent GDSL esterase/lipase conserved blocks I, II, III, and V, respectively (Figure 4, Additional file 11). As expected, the GDSL domain represented by these four blocks is shared by the proteins, consistent with its major functional role. Other well-conserved motifs outside the GDSL domain were also detected. Notably, 12 conserved motifs (motifs 1-12, with E-values around e-100) of 11-14 amino acids in length are present in almost all proteins (Additional file 11). The other 33 motifs were found to be specific to different subclades of the GDSL esterase/lipase phylogenetic tree.

Figure 3 The phylogenetic relationship of the OsGELP gene family. The unrooted tree was constructed with the ClustalW program by the NJ method with 1,000 bootstrap replicates, based on a multiple sequence alignment of the rice OsGELP protein sequences. Subclades are numbered on the right of the tree and marked with alternating background tones to make subclade identification easier. OsGELP genes in boxes of the same colour are segmentally duplicated genes. Coloured dots indicate tandemly duplicated genes. Vertical dashed black lines mark genes from genomic clusters. Bootstrap values lower than 50 are not shown.

We found that the GDSL proteins that cluster in clade I of the phylogenetic tree share a similar motif pattern (motifs 14, 16, 20, and 21), whereas there were no specific motifs for clade III. At the same time, the subclades of clade III show high diversity in specific motifs (Figure 4). Most of the OsGELP proteins that cluster with homologs and/or orthologs in the same subclade share more than one additional conserved motif outside the GDSL domain. Motifs 13, 19, 22, and 27 are specific to subclades Ia, Ib, Ic, and Id, whereas motifs 33, 34, 38, 43, and 45 appear exclusively in subclade Ie (Figure 4). Subclades IIIa, IIIb, and IIId, and subclades IIIe and IIIf, contain the specific motifs 28, 31, and 39, and 15 and 25, respectively (Figure 4, Additional file 11). Subclades Ia and Ib exclusively contain motifs 23 and 32, respectively. Motif 24 is specific to subclade IIIb, and subclade IIIe appears to have the distinct motifs 26 and 35. Finally, five particular motifs (28, 29, 37, 40, and 44) belong to subclade IIIf (Figure 4).
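A simple way to find subclade-specific motifs like those listed above is to record, for each motif, the set of subclades whose proteins contain it, and to keep motifs confined to a few subclades (Figure 4 highlights motifs restricted to one, two, or three subclades). The motif-to-protein assignments below are hypothetical toy data that reuse motif and subclade labels from the text; they are not the study's actual motif assignments.

```python
from collections import defaultdict


def subclade_specific_motifs(motifs_by_protein, subclade_of, max_subclades=3):
    """Return {motif: sorted subclades} for motifs found in at most max_subclades subclades."""
    where = defaultdict(set)
    for protein, motifs in motifs_by_protein.items():
        for motif in motifs:
            where[motif].add(subclade_of[protein])
    return {m: sorted(s) for m, s in where.items() if len(s) <= max_subclades}


# Hypothetical toy data: protein -> motif numbers, protein -> subclade.
motifs = {
    "p1": {3, 5, 14, 23},
    "p2": {3, 5, 14, 32},
    "p3": {3, 5, 14},
    "p4": {3, 5, 14},
    "p5": {3, 5, 24, 31},
    "p6": {3, 5, 31},
}
subclade = {"p1": "Ia", "p2": "Ib", "p3": "Ic", "p4": "Id", "p5": "IIIb", "p6": "IIIa"}
specific = subclade_specific_motifs(motifs, subclade)
print(specific)
```

In this toy example, motifs 3 and 5 (present in all six subclades) and motif 14 (present in four clade I subclades) are not reported, while motifs confined to one or two subclades are.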

The newly found subclade-specific motifs were considered novel because they showed no statistically significant sequence similarity to known motifs, and no function could be assigned to them, in the PROSITE and UniProtKB/Swiss-Prot databases [64,65].



Distribution of the conserved motifs and their locations on the three-dimensional structure

We considered the possibility that the consensus regions outside the motifs encoding GDSL esterase/lipase conserved blocks I, II, III, and V may contain functionally important motifs involved in substrate specificity, the structural organization of the protein, protein-protein interactions, etc. Such "supplemental" functional motifs often remain conserved among members of a subgroup in large plant gene families [66,67], so proteins within a subgroup that share these motifs are likely to have similar functions. To locate these additional motifs in three dimensions, and thus support our functional predictions, the structures of the OsGELP proteins were predicted using the Protein Homology/analogY Recognition Engine (PHYRE) server [68].
Figure 4 An analytical view of the phylogenetic relationship among the rice OsGELP proteins and plant homologues of known function. Protein NJ tree: the unrooted tree, constructed using ClustalW, summarizes the evolutionary relationships among 120 members of the plant GDSL esterase/lipase family. The NJ tree was constructed from an alignment of only the highly conserved amino acid sequence regions. The tree shows 13 major phylogenetic groups. The left column identifies subclades, which are marked with alternating background tones to make subclade identification easier. The numbers beside the branches represent bootstrap values based on 1,000 replications; values lower than 50 are not shown. Protein motif structure and location: the OsGELP and plant GDSL esterase/lipase proteins are shown in the order of their appearance in the phylogenetic tree. Each coloured box represents a particular motif. The consensus sequence, length (amino acids), number of GDSL esterase/lipase proteins containing the motif, and E-value of each motif are given in Additional file 11. The GDSL motif blocks I, II, III, and V are indicated by pink boxes above the motif distribution pattern. Protein lengths (amino acids) can be estimated using the scale at the bottom. Motifs enclosed in red, blue, or green frames are highlighted motifs that appear exclusively in proteins from one, two, or three subclades, respectively. The number of highlighted motifs specific to one or several subclades is given on the right. The secondary structure element assignment, below the motif distribution scheme, corresponds to the general structure of the OsGELPs.



Structural homology detection identified four of the most closely homologous structures among the bacterial GDSL motif proteins. The lipase/acylhydrolase from Enterococcus faecalis [Protein Data Bank (PDB) code 1yzf] showed 10-15% similarity, the esterase from Streptomyces scabies (PDB code 1esc) 10-14% similarity, and thioesterase I from E. coli (PDB code 1ivn) 15-18% similarity. Finally, the general prediction model of the OsGELP proteins was built using the X-ray structure of the aryl esterase from M. smegmatis (PDB code 2q0q), which showed the highest similarity, from 17% to 19% (Figure 5A).

The predicted basic structural model consists of six α-helices and a central β-sheet core containing six parallel β-strands (Figure 5). The active Ser residue is located in the loop region (L1) right after the first β-strand, whereas in the bacterial structural model, Ser appears in a short helical segment following the first β-strand. The aspartic acid and His residues, which together with Ser form the catalytic triad, appear to occupy the same location in plants and bacteria, residing in the turn preceding the C-terminal α-helix (Figure 5B). Blocks II and III, with their representative Gly and Asn residues that act as proton donors to the oxyanion hole, are located in the unstructured regions following the second β-strand and right after the third β-strand, respectively, designated L3 and L5 in Figure 5B.

Moreover, many predicted putative motifs within the unstructured loop regions were specific to members of phylogenetic clades I or III and/or their subclades (Figure 5B). Three loops (L1, L3, and L9) can be singled out as the most divergent in motif composition among the OsGELP phylogenetic groups that differ in biological function. These loops may contribute to differences in substrate-binding specificity among the subclades, and thus to their broad functional divergence.
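To relate motif positions to loop regions such as L1, L3, and L9, one can derive the loops lying between secondary-structure elements from a per-residue secondary-structure string and report which loops each motif overlaps. The string format ('H' helix, 'E' strand, any other character coil), the choice to ignore the N- and C-terminal tails, and all coordinates below are hypothetical assumptions for illustration; the loop numbering of a real model would have to be checked against Figure 5B.

```python
import re


def internal_loops(ss):
    """Return 1-based (start, end) spans of coil segments between two secondary-structure
    elements; N- and C-terminal coil tails are ignored."""
    spans = [(m.start() + 1, m.end()) for m in re.finditer(r"[^HE]+", ss)]
    return [(s, e) for s, e in spans if s > 1 and e < len(ss)]


def loops_overlapping_motifs(ss, motifs):
    """Map each motif name to the loop labels (L1, L2, ...) that its span overlaps.

    motifs: {name: (start, end)} with 1-based inclusive residue positions.
    """
    loops = internal_loops(ss)
    return {
        name: [f"L{k}" for k, (ls, le) in enumerate(loops, start=1) if ms <= le and me >= ls]
        for name, (ms, me) in motifs.items()
    }


# Toy structure: tail, strand, loop, helix, loop, strand, loop, helix, tail (hypothetical).
ss = "--" + "EEEE" + "---" + "HHHHH" + "--" + "EEE" + "----" + "HHHH" + "--"
motifs = {"motif_A": (5, 11), "motif_B": (14, 21), "motif_C": (10, 12)}
print(internal_loops(ss))
print(loops_overlapping_motifs(ss, motifs))
```

A motif can overlap several loops, as motif_B does here, or none, like motif_C, which lies entirely within a helix.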

Discussion 

During the course of evolution, plant gene families have generally undergone tandem and/or large-scale segmental duplication, which maintains a high number of family members [69-71]. The phylogenetic tree (Figure 3) demonstrates that the genes from 7 gene clusters are sister genes with high degrees of phylogenetic relatedness. Only 17 genes, from gene clusters I, IV, VI, VIII, and XII, probably emerged through local duplication, as previously shown in the phylogenetic analysis by Volokita et al. [30]. That phylogenetic study of plant GDSL esterases/lipases from bryophytes, gymnosperms, monocots, and eudicots suggested that the duplication of more than 40% of rice GDSL genes predated the Sorghum-Oryza split [30]. When this fraction is combined with the other duplication events, such as segmental and tandem duplications, as many as 71% of the OsGELP genes may have arisen through duplication. Taken together, the data suggest that duplication in general played a major role in the multiplication of the OsGELP genes during evolution. These conclusions are in line with a previous examination of the evolutionary mechanisms of the GDSL esterase/lipase gene family in land plants [30]. The fact that many OsGELP gene clusters are interrupted by TE-related gene insertions suggests that duplication events in the GDSL esterase/lipase gene family were followed by TE insertions over the course of evolution. The large proportion (62%) of OsGELP genes containing TEs can also be regarded as supporting the hypothesis that, after duplication, the amplification of repetitive elements was a subsequent and important event in the expansion of the OsGELP gene family in the rice genome (Additional file 8). This observation is consistent with the previous conclusion that the addition of TEs is one of the forces driving the amplification of the rice genome [72].

Several forms of positive and negative gene regulation involving plant introns have been described [73]. Considering that intron evolution in the rice genome is largely dominated by intron loss [74,75], the large introns retained in the OsGELP genes in the course of natural selection are likely to be functional. Recent studies have shown that some introns can function as alternative promoters or enhancer elements, and some introns promote mRNA accumulation through diverse processes called intron-mediated enhancement [73]. In addition, introns appear to be under lower selection pressure than exons; they therefore vary frequently in size and sequence, and their positions within genes can facilitate the evolution of new proteins through exon shuffling, while alternative splicing increases the coding capacity of a genome [73,76,77].

Figure 5 Schematic diagrams of the structure prediction for the rice OsGELP esterase/lipase proteins. A. Stereoview of the ribbon diagram of the general structure prediction model of the OsGELP proteins. The six-stranded β-sheet is labelled. The catalytic triad residues Ser, Asp, and His are shown as sticks. B. Common schematic view of the OsGELP protein secondary structure. The six parallel β-strands are labelled β1-β6 and the helices are labelled α1-α6. The loop regions are labelled L1-L10. The GDSL consensus blocks are coloured magenta and the catalytic residues are shown. Loops with highly variable motif composition (L1, L3, and L9) are indicated. The phylogenetic subclades in Figure 4 that contain specific motif(s) within these loops are enclosed in shaded coloured boxes next to the motif numbers.
Although the OsGELP genes with long introns contain repetitive elements, the majority of them (47 of 52) are expressed. For example, the aforementioned OsGELP21 and OsGELP97 genes are expressed in various rice organs as three and two alternatively spliced forms, respectively, as supported by cDNA evidence (Figure 1, Additional file 2). Stress is one of the factors that affect alternative splicing of pre-mRNAs, because stress regulation might enable plants to quickly regulate the splicing and expression of many unrelated genes [61]. Many alternatively spliced transcripts expressed under stress conditions were found among the long-intron genes (Figure 1). For instance, of the three alternatively spliced forms encoded by OsGELP21, the first and third are expressed in shoots and calluses under etiolation and heat treatments. These results suggest that subsequent studies should further investigate the functions and transcriptome complexity of the OsGELP gene family.

In accordance with the phylogenetic analysis, 24 plant GDSL esterase/lipase genes whose functions were recently elucidated fell into two putative groups, clades I and III, that differ in their general biological processes. According to the experimental findings [33-57], the OsGELP orthologs and paralogs of known function from clade I may be involved in secondary metabolism, plant development, and morphogenesis, whereas the orthologs from clade III seem to play a role in plant defence and reproduction (Additional file 10). Furthermore, to explore possible functional divergence of the GELP genes in rice, the microarray expression data for clades I and III were examined for their responses to different treatment conditions by querying the Genevestigator microarray database [58]. Applying a 2-fold expression difference cutoff, the expression profiles of 50 OsGELP genes that share 28 to 80% similarity with the 24 GDSL esterase/lipase genes of known function are summarized in Figure 6 (Additional file 10). As shown in Figure 6, factors such as nutrient deficiency, chemical and hormonal treatments, and biotic and abiotic stresses can modulate the expression of these 50 genes. The most notable expression difference between clades I and III appears to be the response to cytokinin treatment [trans-zeatin (tZ), 6-benzylaminopurine (6-BAP), or kinetin (KT)] (Figure 6). Cytokinins are a class of plant hormones associated with the regulation of plant growth and development, chloroplast biogenesis, bud and root differentiation, shoot meristem initiation and growth, stress tolerance, and organ senescence [78]. Genes from clade III do not show significant changes in expression in the presence of cytokinins. In contrast, many members of clade I show differential expression under KT, tZ, and BAP treatment (Figure 6), implying a possible role for the clade I genes in plant growth and development.
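The 2-fold cutoff used to summarise the Genevestigator profiles can be illustrated with a short sketch. It assumes that each value is a treated/control fold ratio and that the cutoff applies symmetrically to up- and down-regulation (|log2 ratio| ≥ 1); the gene names and numbers are hypothetical and are not values from Figure 6.

```python
import math


def classify_response(fold_change: float, cutoff: float = 2.0) -> str:
    """Label a treated/control ratio as 'up', 'down', or 'unchanged'.

    A symmetric 2-fold cutoff is equivalent to |log2(fold_change)| >= 1.
    """
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    log2_fc = math.log2(fold_change)
    threshold = math.log2(cutoff)
    if log2_fc >= threshold:
        return "up"
    if log2_fc <= -threshold:
        return "down"
    return "unchanged"


def responsive_genes(profiles: dict[str, dict[str, float]], treatment: str,
                     cutoff: float = 2.0) -> list[str]:
    """Return genes whose response to `treatment` passes the cutoff in either direction."""
    return sorted(
        gene for gene, by_treatment in profiles.items()
        if treatment in by_treatment
        and classify_response(by_treatment[treatment], cutoff) != "unchanged"
    )


# Hypothetical fold ratios, for illustration only.
example = {
    "GENE_A": {"tZ": 2.6, "KT": 1.1},
    "GENE_B": {"tZ": 0.4, "KT": 0.9},
    "GENE_C": {"tZ": 1.5},
}
print(responsive_genes(example, "tZ"))  # ['GENE_A', 'GENE_B']
```

Working on log2 ratios keeps a 2-fold decrease (ratio 0.5) as far from "no change" as a 2-fold increase, which a raw-ratio threshold would not.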

To validate the microarray data obtained from the Genevestigator database, changes in the expression levels of 17 representative OsGELP mRNAs from clades I and III under cytokinin (tZ, KT, or BAP) treatments in rice seedlings were examined by quantitative real-time RT-PCR. The treatment conditions were reproduced according to the experiment descriptions in the Genevestigator database. The RT-qPCR expression patterns of 8 and 2 selected genes treated with tZ and BAP, respectively, followed the same tendency and confirmed the microarray data (Additional file 12). The digital expression analysis results for OsGELP2, 17, 12, 61, 44, 77, 90, 100, and 92 were not consistent with the qPCR analysis, and did not show up-regulation during the tZ and KT treatments, respectively. Although most genes from both clades showed up-regulation of their expression, only 3 genes from clade I (OsGELP15, 50, and 88) were significantly up-regulated (>2-fold) after treatment with tZ for 30 min or BAP for 3 h. In contrast, none of the genes from clade III showed a significant fold change under cytokinin treatment (Additional file 12), suggesting functional differentiation of the two examined clades. Further experiments are needed to confirm the validity of the microarray data and to explore the functional divergence of the OsGELP family.
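Whether qPCR "followed the same tendency" as the microarray can be made explicit by comparing the direction of change per gene. The sketch below is one possible way to do this, assuming both platforms report treated/control fold ratios; the optional dead zone `min_log2` and all gene names and values are hypothetical.

```python
import math


def direction(fold: float, min_log2: float = 0.0) -> int:
    """Return +1 (up), -1 (down) or 0 for a treated/control fold ratio.

    Changes with |log2(fold)| <= min_log2 count as 0 (no change).
    """
    lfc = math.log2(fold)
    if lfc > min_log2:
        return 1
    if lfc < -min_log2:
        return -1
    return 0


def concordant(pairs: dict[str, tuple[float, float]],
               min_log2: float = 0.0) -> dict[str, bool]:
    """Map each gene to True when microarray and qPCR agree in direction."""
    return {
        gene: direction(array_fc, min_log2) == direction(qpcr_fc, min_log2)
        for gene, (array_fc, qpcr_fc) in pairs.items()
    }


# Hypothetical (microarray fold, qPCR fold) pairs.
pairs = {"GENE_A": (2.4, 3.1), "GENE_B": (1.8, 0.7), "GENE_C": (0.5, 0.6)}
result = concordant(pairs)
print(result)  # {'GENE_A': True, 'GENE_B': False, 'GENE_C': True}
print(sum(result.values()), "of", len(result), "genes agree")  # 2 of 3 genes agree
```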

Following the assumption that the functions of uncharacterized GDSL esterases/lipases can be deduced from orthologs of known function [30], we attempted to extrapolate the functional characteristics of the described plant GDSL enzymes to the rice OsGELP genes. Using the functional descriptions of the potential orthologs and homologs, based on the phylogenetic grouping, putative functions were predicted for a number of OsGELP genes and are discussed below.

Figure 6 Expression pattern of the OsGELP genes with predicted functions in response to different treatment conditions. Microarray-based expression profiles under various conditions are presented for 50 OsGELP genes using the meta-profile analysis tool at Genevestigator. Transcript levels are shown as relative fold values. The OsGELP genes are ordered as they appear in the phylogenetic tree. Clade and subclade numbers are given on the left side of the diagram. The subclades are highlighted in the same alternating tones as in the phylogenetic tree in Figure 4.

The rice GDSL esterase/lipase family members OsGELP4, 5, and 12 from subclade Ia share high similarity with AmGDSH1 (Alopecurus myosuroides hydrolase), which demonstrates acetylajmaline hydrolase activity and is involved in alkaloid metabolism [47]. Subclade Ib (OsGELP2, 3, 63, 77, 84, and 85) is expected to be involved in plant development and morphogenesis at the seedling stage, according to the function of their close homologue GER1 (OsGELP33) [56]. These genes are expressed in many rice organs and developmental stages, share an analogous gene structure and the special protein motif 32, and also change their expression dramatically under stress conditions during early plant development (Figures 1, 4, and 6). Two genes from subclade Ib have received attention in the recent literature. The OsGELP63 gene is induced by both red and far-red light and by jasmonic acid, and responds to drought and cold stresses [79]. The study of OsGELP33 (GER1) demonstrated the role of this gene in rice development at the seedling and coleoptile elongation stages [56]. OsGELP33, together with its sister genes OsGELP84 and OsGELP3, arose from a segmental duplication event (Figure 2). Their close phylogenetic relationship is confirmed by the high node number and the high protein similarity scores (Figure 4, Additional file 10). Therefore, the functions of these genes might be similar to that of OsGELP33 (GER1).

Subclade Ie, mentioned previously, is a good example of a group of genes with possibly related functions. The genes in subclade Ie appear to encode cell wall-associated proteins with carbohydrate substrate specificity (Figure 4, Additional file 10). Together with the cell wall-associated GDSL esterase/lipase orthologs (e.g., AtFXG1, LAE, Enod8, maize AChE, and HevblS) [46,50,51,53,54,80], rice OsGELP14, 15, 16, 60, 61, 66, 80, 91, and 92 form a distinctive group in clade I (Figure 4). The α-fucosidase 1 (AtFXG1) from A. thaliana, lanatoside 15'-O-acetylesterase (LAE) from Digitalis lanata Ehrh. (woolly foxglove), and their homologue early nodulin protein (Enod8) from Medicago sativa are reportedly active on oligo- or polysaccharide substrates [46,50,51]. LAE acts as a deacetylase on cardenolide glycosides (cardenolides that contain structural groups derived from sugars) [51]. AtFXG1 modifies xyloglucan oligosaccharides through the hydrolysis of L-fucosyl residues [50]. Members of the acetylcholinesterase (AChE) gene family have recently been cloned and characterized in several plants, including Z. mays L., Macroptilium atropurpureum, and Salicornia europaea L. [52-55]. Although the definitive physiological role of the AChE gene family has not yet been elucidated, AChEs are suggested to play a role in the gravity response of plants. According to the motif analysis, this group of cell wall-associated proteins shares several special motifs in subclade Ie, such as motifs 38, 33, 34, 43, and 45 (Figure 4).

A total of 29 rice OsGELP genes from clades I and III may be important in the plant defence response against biotic infections, as suggested by their microarray expression data (Figure 6) and their relatively high similarity to a number of defence-related GDSL esterases/lipases (e.g., CaGLIP1, CaGL1, AtLTL1, GLIP1, GLIP2, BrSIL1, ESM1, and MVP1) (Additional file 10) [33-41]. As a potentially appealing subject for future analyses of the OsGELP gene family, subclade IIIf could be studied specifically. It not only contains five exclusive motifs (28, 29, 37, 40, and 44), but some of its members are also expressed in all rice organs and share similar gene structure patterns (Figures 1 and 4, Additional file 6).

Based on the protein sequence analysis, diverse consensus regions were found outside the motifs that encode the GDSL esterase/lipase conserved blocks I, II, III, and V. These consensus motifs are specific to different phylogenetic clades and/or subclades of the conjoint tree that differ in biological function (Figure 4). GDSL esterases/lipases are active on a wide range of substrates. This multienzymatic activity can be explained by the flexible substrate-binding pocket in the active site, which facilitates the binding of different substrates [3]. Considering that many motifs can be functionally important and play a role in enzyme specificity and biochemical activity, the long loop regions extending from the protein core in plant GDSL esterases/lipases might be involved in the diversification of molecular multifunctionality, as was found in bacterial species [25,27]. For example, aryl esterase from M. smegmatis and thioesterase I from E. coli share a common structural fold but differ in additional insertions, namely unstructured loop regions in the aryl esterase protein. It was suggested that such insertions might determine the type of enzymatic mechanism, contribute to oligomerization, and greatly restrict the shape of the enzyme active site [27]. Many of the predicted motifs within the loop regions were found to be specific to members of particular phylogenetic subclades that unite GDSL enzymes with similar biological functions (Figures 4 and 5). Based on these findings, we would like to highlight loops L1, L3, and L9 specifically. The peptide regions of these loops are specific to different subclades from clades I and III. Based on the functional prediction, these subclades represent proteins with different molecular functions and reaction types. As shown in the 3D protein structure prediction model (Figure 5A), loops L1, L3, and L9 are hypothetically oriented around the enzyme active site and may contribute to the flexibility of the substrate-binding pocket. Therefore, these loops should be studied further to determine their role in the molecular functional diversification of plant GDSL enzymes. Reverse genetics experiments would be required to establish the contribution of these motifs. The close homologs or orthologs of known putative function from other plant species, which cluster together with the OsGELP proteins in the same subclades, share one or more additional conserved motifs (Figure 4). Although the functions of these specific motifs outside the GDSL esterase/lipase domain are still unknown, the presence of the conserved motifs likely reflects functional similarities between the OsGELP proteins and the homologous plant proteins of known function that share these motifs.

The rice GDSL esterase/lipase family is notably one of the 11 largest families in the rice genome, with more than 100 members [81]. In other fully sequenced plant genomes, the GDSL esterase/lipase family also consists of a high number of members [29,30]. The remarkably high number of GDSL family genes in different plants can be explained by differences in enzyme function and activity on a wide range of substrates, as shown by Volokita et al. [30]. This claim is supported by existing data on GDSL esterases/lipases that have already been functionally analysed, cloned, and characterized in different plant species, and whose physiological roles, properties, and functions have been elucidated (Additional file 10). The multifunctionality of the OsGELP family in rice, as in other land plant species, and its diverse roles in different aspects of plant growth and development can be explained by the complexity and diversity of the genes at the structural level. The large number of genes that make up the GDSL esterase/lipase family in land plants, with many distinct groups and subgroups arising in the course of evolution, further explains their functional divergence. Hypothetically, plant GDSL esterase/lipase proteins are the evolutionary product of recombination of several proteins and contain various domains/motifs with putative functions. Such an assumption provides a clue for further study of the diverse functionality of this enzyme family, and the motif search analysis presented here offers further evidence for it. Our manuscript introduces, for the first time, a concrete rationale for further experimentation with the rice OsGELP family members, presents unique opportunities, and articulates a coherent basis for functional studies. Further analyses of gene function using RNAi and overexpression are currently under way to elucidate the underlying mechanisms.

Conclusions 

The present bioinformatics analysis provides new insights into the genomic and proteomic diversity of the rice GDSL esterase/lipase gene family. The phylogenetic analysis divides the OsGELP gene family into distinct groups that share similar protein motif structures. We found 41 additional motifs that represent the core secondary structure elements or appear specifically in different phylogenetic subclades. Members within the specified subclades may have common evolutionary origins and share common, unambiguous motifs that probably reflect their related molecular functions. Thus, our study provides the basis for, and should stimulate, future full-fledged functional studies of these particular motifs, since understanding the structure-function relationships of the members of the OsGELP gene family is necessary.

To date, only a few rice OsGELP genes have been functionally characterized. Here, we provide a rationally reasoned, well-defined platform for more detailed, in-depth functional studies of the OsGELP genes, based on a combination of phylogenetic, motif, and protein three-dimensional structure analyses. The findings presented in our manuscript can be used to select candidate genes for functional validation studies. They are of broad interest to the biological research community, with wide and important practical applications in biotechnology and food science. Researchers from different domains and with different goals will find our analyses valuable for initiating their investigations.



Methods 

Identification of genes encoding GDSL esterases/lipases in the genome sequence of rice (O. sativa subsp. japonica cv. Nipponbare)

A total of 132 genes were identified as possible candidates for GDSL esterase/lipase proteins using primary bioinformatics analysis. First, genes previously annotated as GDSL esterases/lipases were collected from several public online databases, such as MSU RGAP (release 6.1), the Rice Protein Database in Gramene, and GenBank at the National Center for Biotechnology Information [31,32,82]. Then, BLAST analyses of the primary candidates were performed using a typical GDSL esterase/lipase protein sequence as the query. The OsGELP candidates were tested against the Hidden Markov Model (HMM) profile (build 2.3.2) of the GDSL domain, PF00657 in the Pfam HMM library, in the MyHits protein domain database [83]. All sequences with an E-value below 0.1, a gathering cut-off above -69.0, and a length above 100 amino acids were selected for further analyses. Subsequently, five genes that possessed repetitive sequences and were defined as retrotransposon genes (LOC_Os01g12340, LOC_Os01g32630, LOC_Os06g24420, LOC_Os10g09130, and LOC_Os11g19690) were excluded from our analyses. We also eliminated several putative OsGELP genes that were annotated as esterase, anther-specific proline-rich protein APG precursor, alpha-L-fucosidase 3 precursor, hypothetical protein, expressed protein, or carboxylic ester hydrolase, and that had a GDSL motif with a Pfam E-value less than 0.1 (Additional file 13).
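The automatic part of this screen can be sketched as a simple filter. The thresholds (E-value < 0.1, gathering score > -69.0, length > 100 aa) and the five excluded locus IDs are taken from the text; the `DomainHit` record layout and the "candidate_N" names are hypothetical, parsing of the actual MyHits/Pfam output is not shown, and the manual removal of the annotated genes listed above is not modelled.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    locus_id: str   # RGAP-style locus ID
    evalue: float   # E-value of the PF00657 (GDSL) domain hit
    score: float    # HMM score compared against the gathering cut-off
    length_aa: int  # protein length in amino acids


# Thresholds quoted in the Methods text.
MAX_EVALUE = 0.1
MIN_SCORE = -69.0
MIN_LENGTH = 100

# Retrotransposon-derived loci excluded in the text.
EXCLUDED = {
    "LOC_Os01g12340", "LOC_Os01g32630", "LOC_Os06g24420",
    "LOC_Os10g09130", "LOC_Os11g19690",
}


def keep_candidate(hit: DomainHit) -> bool:
    """True if a hit passes all numeric thresholds and is not an excluded locus."""
    return (
        hit.evalue < MAX_EVALUE
        and hit.score > MIN_SCORE
        and hit.length_aa > MIN_LENGTH
        and hit.locus_id not in EXCLUDED
    )


# Hypothetical hits for illustration.
hits = [
    DomainHit("candidate_1", 1e-40, 120.5, 370),    # passes
    DomainHit("candidate_2", 0.5, 10.0, 350),       # E-value too high
    DomainHit("candidate_3", 1e-10, 30.0, 80),      # too short
    DomainHit("LOC_Os06g24420", 1e-20, 60.0, 400),  # excluded retrotransposon
]
kept = [h.locus_id for h in hits if keep_candidate(h)]
print(kept)  # ['candidate_1']
```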

The OsGELP genes were named according to their positions on rice chromosomes 1-12. In the present study, the LOC prefix was removed from all RGAP locus IDs representing GDSL esterase/lipase genes for convenience. Information on ORF length, amino acid number, molecular weight, and isoelectric point of each protein was downloaded from RGAP. The full-length cDNAs of all predicted genes were searched in KOME [84]. Genomic sequences that were misannotated compared with the available FL-cDNA sequences were corrected manually for subsequent analyses.
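The position-based naming can be sketched as follows, assuming that numbering simply follows chromosome number and then the start coordinate on that chromosome. The loci and coordinates in the example are hypothetical, and Python 3.9+ is assumed for `str.removeprefix`.

```python
def name_by_position(loci: list[tuple[str, int, int]],
                     prefix: str = "OsGELP") -> dict[str, str]:
    """Assign sequential names ordered by chromosome, then start coordinate.

    Each locus is (locus_id, chromosome, start_bp). The returned keys are the
    locus IDs with any leading "LOC_" removed, mirroring the text's convention.
    """
    ordered = sorted(loci, key=lambda item: (item[1], item[2]))
    names = {}
    for index, (locus_id, _chrom, _start) in enumerate(ordered, start=1):
        names[locus_id.removeprefix("LOC_")] = f"{prefix}{index}"
    return names


# Hypothetical loci: (ID, chromosome, start position in bp).
loci = [("LOC_geneA", 2, 500), ("LOC_geneB", 1, 9000), ("LOC_geneC", 1, 100)]
print(name_by_position(loci))
# {'geneC': 'OsGELP1', 'geneB': 'OsGELP2', 'geneA': 'OsGELP3'}
```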

Distribution of the OsGELP genes on rice chromosomes and duplication events

The chromosomal distribution of the predicted OsGELP gene members was retrieved from the RGAP database. Information on their physical positions was derived from RGAP according to their locations on the rice chromosome pseudomolecules [32]. To identify closely linked OsGELP genes on the rice chromosomes, defined as gene cluster sets, the RGAP Rice Genome Browser was explored. Segmental duplication analysis was done with the RGAP rice segmental duplication database, with the maximum distance permitted between collinear gene pairs set to 500 kb. Information on tandemly duplicated OsGELP genes, paralogs, and orthologs was obtained from the Rice Protein Database in Gramene [31], the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [85], the GreenPhyl Orthologs Search Tool (GOST) [86], and the Orthologous Groups Search page on RGAP. Outparalogs were determined from the phylogenetic analyses of GDSL esterases/lipases from 7 plant species by Volokita et al. [30]. Proteins designated as homologous to the 24 plant GDSL esterase/lipase genes whose putative functions were recently annotated share 30%-80% similarity.
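Gene clusters were identified by inspecting the RGAP Rice Genome Browser, and the text does not state a distance threshold (the 500 kb value applies only to collinear pairs in the segmental duplication analysis). As an illustration of what "closely linked" could mean computationally, the sketch below groups neighbouring genes on the same chromosome whose intergenic gap is at most a hypothetical `max_gap`; it ignores unrelated genes lying between cluster members.

```python
from itertools import groupby


def find_clusters(genes: list[tuple[str, int, int, int]],
                  max_gap: int) -> list[list[str]]:
    """Group genes into clusters of two or more neighbours per chromosome.

    genes: (name, chromosome, start_bp, end_bp). Consecutive genes (sorted by
    start) join the same cluster when next.start - previous end <= max_gap.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    clusters = []
    for _chrom, group in groupby(ordered, key=lambda g: g[1]):
        current: list[str] = []
        last_end = None
        for name, _c, start, end in group:
            if last_end is not None and start - last_end <= max_gap:
                current.append(name)
            else:
                if len(current) >= 2:
                    clusters.append(current)
                current = [name]
            last_end = end if last_end is None else max(last_end, end)
        if len(current) >= 2:
            clusters.append(current)
    return clusters


# Hypothetical genes: (name, chromosome, start, end).
genes = [
    ("g1", 1, 1000, 3000), ("g2", 1, 5000, 7000), ("g3", 1, 60000, 62000),
    ("g4", 2, 100, 900), ("g5", 2, 2000, 2500), ("g6", 3, 10, 20),
]
print(find_clusters(genes, max_gap=10_000))  # [['g1', 'g2'], ['g4', 'g5']]
```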

Exon/intron structure and sequence analysis 

The exon/intron structures of the OsGELP genes were retrieved from RGAP [32] and the Gramene/Ensembl Genome Annotation for Rice [31]. For genes with available cDNA sequences, the structure was checked manually by aligning genomic and cDNA sequences. Diagrams of the exon/intron structures and information on intron distribution patterns were obtained using the online Gene Structure Display Server [87]. Alternative splicing of the OsGELP genes was validated manually by aligning rice FL-cDNAs with genomic sequences or by using the RGAP Rice Genome Browser. Repetitive sequences were screened using RepeatMasker [88].
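Intron positions and lengths follow directly from the exon coordinates of each transcript. A minimal sketch, assuming 1-based inclusive, non-overlapping exon intervals on one transcript (the coordinates below are hypothetical):

```python
def introns_from_exons(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Infer intron intervals from the exon intervals of one transcript.

    Coordinates are 1-based and inclusive; exons may be given in any order
    but must not overlap. Each intron covers the bases strictly between two
    consecutive exons; directly abutting exons produce no intron.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:
            raise ValueError("exons overlap")
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


exons = [(900, 1000), (1, 120), (400, 560)]  # hypothetical exon coordinates
introns = introns_from_exons(exons)
print(introns)                                 # [(121, 399), (561, 899)]
print([end - start + 1 for start, end in introns])  # intron lengths: [279, 339]
```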

Multiple sequence alignment and phylogenetic analysis

The OsGELP cDNA and CDS nucleotide sequences were translated into protein sequences. The protein sequences were aligned with ClustalW, then manually corrected and imported into MEGA4 (version 4.0) [89]. A total of 18 OsGELP genes were excluded from the final alignment because they lacked some of the conserved GDSL blocks and had poorly matched, gap-rich alignable regions. The culled protein set of 96 OsGELP genes was used to construct the trees. A second unrooted NJ phylogenetic tree combined the 96 OsGELP genes with 24 plant GDSL orthologs or homologs whose putative functions were recently annotated, following the procedure described by Volokita et al. [30].
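The trees in this study were built with MEGA4. To show what the neighbour-joining (NJ) step computes, here is a minimal from-scratch sketch of the Saitou-Nei algorithm on a distance matrix. It is not MEGA's implementation, performs no bootstrapping, and the five-taxon matrix is a standard textbook toy example, not OsGELP data.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Return an unrooted neighbour-joining tree (Saitou-Nei) in Newick format.

    `dist` must be symmetric with zeros on the diagonal. Negative branch
    lengths, which NJ can produce on non-additive data, are clipped to 0.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # Active nodes: integer id -> Newick text of the subtree at that node.
    labels = {i: name for i, name in enumerate(names)}
    d = {i: {j: float(dist[i][j]) for j in range(len(names))} for i in range(len(names))}
    next_id = len(names)

    while len(labels) > 3:
        active = list(labels)
        n = len(active)
        r = {i: sum(d[i][k] for k in active) for i in active}
        # Choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = active[x], active[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and to j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = next_id
        next_id += 1
        labels[u] = f"({labels[i]}:{max(li, 0.0):.4g},{labels[j]}:{max(lj, 0.0):.4g})"
        # Distances from u to every remaining node.
        d[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        del labels[i], labels[j]

    # Join the last three nodes at a single central node (unrooted tree).
    a, b, c = list(labels)
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    parts = [(a, la), (b, d[a][b] - la), (c, d[a][c] - la)]
    return "(" + ",".join(f"{labels[k]}:{max(length, 0.0):.4g}" for k, length in parts) + ");"


names = ["a", "b", "c", "d", "e"]
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, D))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Because this toy matrix is additive, NJ recovers the exact branch lengths; real alignment-derived distances are generally not additive, which is why bootstrap support (1,000 replicates in the text) is reported at the nodes.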

A multiple-step strategy was used to construct the phylogenetic trees. Very large protein families commonly contain various domains and repeats that make them extremely difficult to analyse. A special feature of the GDSL esterases/lipases is the presence of four strictly conserved residues, Ser-Gly-Asn-His, in conserved blocks I, II, III, and V. Consequently, our first approach was to construct the phylogenetic tree based on these four blocks of the GDSL enzyme. Unexpectedly, the node numbers were very low, and no type of phylogenetic tree analysis improved them. The multiple alignments showed several strictly conserved regions that were consistent throughout the protein sequences of all GDSL candidates, along with less conserved regions containing gaps. To analyse these well-conserved regions, a motif identification search was conducted together with protein structure prediction. First, additional putative conserved motifs were identified with the Multiple Em for Motif Elicitation (MEME) program from a total of 120 plant GDSL esterase/lipase proteins (96 rice OsGELP proteins and 24 plant GDSL esterases/lipases whose putative functions were recently elucidated) [90]. Second, after the structural topology of the OsGELPs was predicted, the multiple sequence alignment, motif search, and protein structure analysis were combined. Thirteen aligned regions (including GDSL esterase/lipase blocks I, II, III, and V) were found to be consistent throughout all 120 proteins and, in most cases, they encode core secondary structure elements such as α-helices and/or β-sheets. Assuming that these core structural regions are mainly ancient, less mutated, and probably under the lowest selection pressure during evolution, the phylogenetic study was performed on these well-conserved regions. As a result, the trees were based on 13 conserved alignment blocks, which are represented by 23 putative conserved motifs (motifs 1-7, 10-12, 17, 20, 22, 24, 27, 30, 36-38, 40, 42, and 44) identified through the motif search analysis (Additional file 14). The phylogenetic trees built on these strict alignment blocks showed the highest node numbers compared with trees based on full-length or four-GDSL-block alignments. In the parts of the sequences outside these well-conserved alignment regions, including the N- and C-termini, gap-rich segments were manually removed from the alignment and phylogenetic analysis of all 120 GDSL protein sequences. Finally, two unrooted phylogenetic trees were constructed using the NJ method and displayed using MEGA4. Bootstrap values from 1,000 replicates were placed at the nodes, and the scale bar corresponds to 0.1 estimated nucleic acid substitutions per site. The topologies of the final unrooted NJ trees were maintained in trees built using other distance or parsimony methods.
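The block-based strategy amounts to keeping only selected alignment columns and computing distances on the concatenated blocks. The sketch below assumes the gapped alignment is available as equal-length strings and the blocks as 0-based, end-exclusive column ranges; the uncorrected p-distance (gap columns skipped pairwise) is an illustrative choice, since the text does not state which distance model was used in MEGA4. The sequences and block boundaries are hypothetical.

```python
def concatenate_blocks(alignment: dict[str, str],
                       blocks: list[tuple[int, int]]) -> dict[str, str]:
    """Keep only the given alignment columns (0-based, end-exclusive ranges)."""
    return {name: "".join(seq[start:end] for start, end in blocks)
            for name, seq in alignment.items()}


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues, skipping columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable columns")
    return diffs / compared


def distance_matrix(seqs: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """Pairwise p-distances in the insertion order of `seqs`."""
    names = list(seqs)
    return names, [[p_distance(seqs[a], seqs[b]) for b in names] for a in names]


# Hypothetical alignment: columns 4-7 form a variable, gap-rich region.
aln = {
    "s1": "GDSLKKKKNGHW",
    "s2": "GDSA--R-NGHW",
    "s3": "GDTA----NAHF",
}
blocks = [(0, 4), (8, 12)]  # keep two "conserved" blocks only
names, matrix = distance_matrix(concatenate_blocks(aln, blocks))
print(names)   # ['s1', 's2', 's3']
print(matrix)  # [[0.0, 0.125, 0.5], [0.125, 0.0, 0.375], [0.5, 0.375, 0.0]]
```

The resulting matrix is the kind of input a neighbour-joining routine takes.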

Determination of conserved motifs and structure modelling

To identify additional putative conserved motifs in the rice OsGELP gene family and in the 24 plant GDSL esterases/lipases whose putative functions were recently elucidated, the MEME motif search tool was used [91]. During the motif distribution search, different sets of parameters for motif width, number, and occurrence were tried. The final motif search used the following criteria: number of repetitions, zero or one per sequence; maximum number of motifs, 45; optimum motif width, >6 and <15. The N- and C-termini were removed from all protein sequences in the final motif search after we confirmed that no additional motifs were present in those parts. To determine which of our motifs can be considered novel, the regular expressions of all identified motifs were compared against the Prosite database patterns [64]. Functional annotation searches were completed with the UniProtKB/Swiss-Prot and Prosite databases [64,65].
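Comparing motifs with Prosite patterns relies on reading Prosite pattern syntax as regular expressions. The sketch below translates the core syntax (`x`, `[...]`, `{...}`, `(n)` / `(n,m)` repeats, `<` / `>` termini anchors) into a Python regex; it does not handle rarer forms such as `>` inside brackets. The pattern `G-D-S-[LIVM]-x(2)-{P}` and the test sequences are hypothetical illustrations, not one of the 41 motifs or a real Prosite entry.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a Prosite-style pattern into a Python regular expression.

    Supports: x (any residue), [..] (allowed set), {..} (forbidden set),
    (n) and (n,m) repeats, and < / > anchors for the N / C terminus.
    """
    pattern = pattern.strip().rstrip(".")  # Prosite patterns end with a period
    parts = []
    for element in pattern.split("-"):
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        element = element.strip("<>")
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element  # a single residue or a [..] set: same syntax in regex
        parts.append(("^" if anchor_start else "") + core + repeat
                     + ("$" if anchor_end else ""))
    return "".join(parts)


regex = prosite_to_regex("G-D-S-[LIVM]-x(2)-{P}")
print(regex)                                          # GDS[LIVM].{2}[^P]
print(re.search(regex, "MAKFGDSLTDNW").group())       # GDSLTDN
print(re.search(regex, "MAKFGDSLTDPW"))               # None
```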

To gather information on the secondary and tertiary structure of the OsGELP proteins, 3D models were constructed with the PHYRE automatic protein homology modelling server [68]. Each submitted OsGELP sequence was scanned against the non-redundant sequence database, the Structural Classification of Proteins (SCOP) database, and the PDB. Aligned structures were displayed and analysed in the PyMOL Molecular Graphics System [92]. The topology map was created using the TopDraw program [93].

Expression analysis of the OsGELP genes 

Evidence of expression for the rice OsGELP genes was obtained from several types of transcript data, such as FL-cDNA, EST, and/or MPSS, from the Expression Evidence Search page at RGAP [32]; microarray data were obtained from the Genevestigator site [58]. The locus names of the GDSL esterase/lipase genes were used to query the MPSS database containing the signature information of the genes [94].
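Tallying which genes have transcript support can be sketched as below. The rule that any FL-cDNA, EST, or MPSS evidence counts as "expressed" is an assumption for illustration, since the text does not define a formal criterion; the record fields and gene names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExpressionEvidence:
    gene: str
    fl_cdnas: int = 0         # full-length cDNAs mapped to the locus
    ests: int = 0             # mapped ESTs
    mpss_signatures: int = 0  # MPSS signatures assigned to the locus


def is_expressed(ev: ExpressionEvidence) -> bool:
    """Assumed rule: any FL-cDNA, EST, or MPSS support counts as expression evidence."""
    return ev.fl_cdnas > 0 or ev.ests > 0 or ev.mpss_signatures > 0


def summarize(evidence: list[ExpressionEvidence]) -> tuple[int, int]:
    """Return (number of genes with evidence, total number of genes)."""
    return sum(is_expressed(ev) for ev in evidence), len(evidence)


# Hypothetical records.
records = [
    ExpressionEvidence("gene_1", fl_cdnas=1, ests=12),
    ExpressionEvidence("gene_2", mpss_signatures=3),
    ExpressionEvidence("gene_3"),
]
expressed, total = summarize(records)
print(f"{expressed} of {total} genes have expression evidence")  # 2 of 3 genes have expression evidence
```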

Hormone treatment and quantitative real-time RT-PCR analysis

To confirm the differential expression of representative OsGELP genes under hormone treatment, as identified by microarray data analysis, seedling tissue samples of rice (O. sativa L. cv. Tainung 67, a japonica variety) were collected. Seeds were sterilized with 70% ethanol for 15 min and then with 2% (w/v) sodium hypochlorite for 15 min, soaked in distilled water at 30°C for 1 day, and the germinated seeds were grown for 7 days or 2 weeks under a photoperiod of 12 h light (30°C)/12 h dark (28°C). For hormone treatment with tZ, the whole roots were cut at the lamina joint in water from the 2-week-old seedlings and immediately dipped in distilled water containing either 5 µM trans-zeatin in dimethylsulfoxide [DMSO; 0.1% (v/v)] or an equal volume of DMSO as a control. Each excised organ was incubated at 30°C for 30 min, as described previously [95]. For the kinetin response study, rice seeds were germinated and grown hydroponically in nutrient solution [96]. Seedlings were grown to the three-leaf stage (two-week-old seedlings) and then treated with 100 µM kinetin for 60 min. For cytokinin treatment with benzylaminopurine (BAP), rice seedlings grown hydroponically for 7 days were transferred to a solution containing 50 µM benzylaminopurine for 3 h. Seedlings mock-treated with dimethylsulfoxide (final concentration 0.1%) served as the control. All samples were harvested and stored at -80°C until RNA extraction.

Real-time PCR analysis was performed using gene-specific primers as described earlier [97]. The primer sequences are listed in Additional file 15. At least three biological replicates were used for each treatment, with duplicate qRT-PCR analyses for each sample. Total RNA was prepared using the RNeasy Plant Mini Kit (Qiagen) with RNase-free DNase I (Qiagen). Approximately 2 µg of total RNA was used as template for first-strand cDNA synthesis, which was performed with SuperScript III RT (Invitrogen, Carlsbad, CA, USA) and oligo(dT)15 primers in a reaction volume of 20 µl. The RT reaction was diluted 1:10, and 5 µl was used in the amplification with the specific PCR primers. Quantitative RT-PCR analysis was performed using an ABI 7500 real-time detection system and SYBR Green dye (ABI, Foster City, CA). PCR amplification was performed in duplicate. Expression levels were normalized to the internal control ACTIN 1 (ACT1) or 18S rRNA [97] to correct for differences in cDNA input. The mRNA levels of each candidate gene in the different tissue samples were calculated using the ΔΔCT method.
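The ΔΔCT calculation can be written out explicitly. In this sketch, ΔCT is the mean CT of the target gene minus the mean CT of the reference gene (for example ACT1) in one sample, and the fold change of treated versus control is 2^-(ΔCT_treated - ΔCT_control). It assumes roughly 100% amplification efficiency for both genes; the CT values are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts: list[float], reference_cts: list[float]) -> float:
    """ΔCT for one sample: mean CT(target) - mean CT(reference)."""
    return mean(target_cts) - mean(reference_cts)


def fold_change_ddct(treated_target: list[float], treated_ref: list[float],
                     control_target: list[float], control_ref: list[float]) -> float:
    """Relative expression (treated vs control) = 2 ** -(ΔCT_treated - ΔCT_control).

    Assumes ~100% amplification efficiency for target and reference genes.
    """
    ddct = delta_ct(treated_target, treated_ref) - delta_ct(control_target, control_ref)
    return 2.0 ** -ddct


# Hypothetical duplicate CT values.
fc = fold_change_ddct(
    treated_target=[24.1, 23.9], treated_ref=[18.0, 18.0],  # ΔCT = 6
    control_target=[26.2, 25.8], control_ref=[18.1, 17.9],  # ΔCT = 8
)
print(round(fc, 3))  # 4.0, i.e. 4-fold up-regulation after treatment
```

A result above 2 would pass the ">2-fold" criterion used for the cytokinin-responsive genes.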

Additional files 



Additional file 1: Characteristics of the rice GDSL esterase/lipase gene family. The gene name, locus ID (MSU Osa1 RGAP Release 6.1), open reading frame length, protein length, FL-cDNA, genomic sequence and CDS accession numbers, and isoelectric points of all 114 OsGELP genes are given.

Additional file 2: Expression evidence for the OsGELP rice genes. The OsGELP gene names, locus IDs, MPSS signature sequences, FL-cDNA numbers, total number of mapped ESTs, and the presence of microarray data from Genevestigator for each of the 153 transcripts (including alternatively spliced models) of the 114 OsGELP genes are given.

Additional file 3: Pattern of the OsGELP gene clusters on rice chromosomes. (A) The order and cluster structure of 54 OsGELP genes on rice chromosomes. (B) The pattern of the OsGELP gene clusters on rice chromosomes that are interrupted by unrelated genes.

Additional file 4: The OsGELP genes present on duplicated chromosomal segments of rice O. sativa L. ssp. japonica. The segmentally duplicated OsGELP genes, with their BLASTP E-values, locus IDs, and chromosome coordinates, are presented according to the RGAP Segmental Genome Duplication of Rice, with a maximal distance of 500 kb permitted between collinear gene pairs.

Additional file 5: The OsGELP genes resulting from duplications after
the eudicot-monocot split and before the speciation of sorghum and
rice. The names and chromosomal locations of these OsGELP genes
are presented.

Additional file 6: Gene structure of the OsGELP genes. The exon/intron
structures of all 153 transcripts (including alternatively spliced
models) of the 114 OsGELP genes are presented. Green and blue boxes
represent exon and UTR regions, respectively, and solid lines indicate
intron regions. The lengths of the boxes and lines are scaled to gene
length.

Additional file 7: Chromosomal location and exon/intron number for
the OsGELP rice genes. The OsGELP gene names, locus IDs, chromosomal
locations, open reading frame and genomic sequence lengths, and
numbers of exons/introns for each of the 114 GDSL esterase/lipase genes
are given.

Additional file 8: Identification of the repetitive DNA sequences within
the OsGELP rice gene family. The types, names, lengths (bp), positions,
and numbers of the repetitive sequences found in 71 OsGELP genes are
shown. The repetitive DNA sequences in each OsGELP gene are listed in
order of appearance from the 5'- to the 3'-end.

Additional file 9: The 18 OsGELP proteins that were excluded from
phylogenetic analysis. The GDSL esterase/lipase gene names, protein
lengths, and the presence of the five strictly conserved residues
Ser-Gly-Asn-Asp-His in conserved blocks I, II, III, and V are given for
the 18 excluded genes. The presence of a consensus GDSL block is
indicated by a filled coloured box, and a blank box indicates the absence
of consensus alignment between that protein and the other OsGELP
proteins.

Additional file 10: Physiological roles, properties, and putative functions
of plant GDSL esterases/lipases. The names, accession numbers, properties,
putative functions, and general biological roles of 24 plant GDSL
esterases/lipases whose putative functions have recently been elucidated,
and which were added to the original rice OsGELP family NJ tree, are
listed. The colour-coded table divides these 24 plant GDSL esterase/lipase
proteins into three groups according to their major biological roles:
secondary metabolism, plant development and morphogenesis, and
defence, shaded in blue, green, and light pink, respectively. In total,
50 OsGELP proteins are given with their names, their percentage
similarity to each functionally characterised plant homolog or ortholog,
and their phylogenetic subclade in the tree from Figure 4.

Additional file 11: Putative conserved motifs predicted in the OsGELP
and known plant GDSL esterase/lipase proteins. The consensus sequence,
regular expression, amino acid length, number of OsGELP proteins
containing the motif, and E-value of each of the 45 predicted motifs are
given. The overall height of each column in the motif LOGO indicates
sequence conservation at that position, whereas the height of the symbols
within each column represents the relative frequency of the corresponding
amino acid. The GDSL lipase consensus blocks are distributed as follows:
block I is located in motif 3, block II in motif 5, block III in motif 6, and
block V in motif 2. Four strictly conserved catalytic residues
Ser-Gly-Asn-HisxxAsp from conserved blocks I, II, III, and V are coloured
red in the regular expressions of the corresponding motifs. Regular
expression sequences coloured blue and green represent possible helix
and sheet secondary structure elements, respectively.

Additional file 12: Differential expression of rice OsGELP genes in
response to the plant hormone cytokinin. (A) Comparison of the fold
changes in expression of 17 representative genes under cytokinin (tZ,
BAP, and KT) treatment, as measured by real-time PCR and as reported
in microarray data obtained from the Genevestigator database.
(B) Real-time PCR analysis of representative OsGELP genes, showing
their differential expression during cytokinin (tZ, BAP, and KT)
treatment. The mRNA level of each gene in the different tissue samples
was calculated relative to its expression in control seedlings. Error bars
represent standard deviation.

Additional file 13: The rice GDSL esterase/lipase genes excluded from
the general list of OsGELP candidates. The locus ID, ORF length,
predicted protein length, presence of the GDSL-lipase domain with its
confidence (E-value), description, and cDNA support of all 19 excluded
genes are given.

Additional file 14: Motifs representing the 13 highly conserved OsGELP
protein alignment blocks used for phylogenetic analysis. The consensus
sequence, regular expression, length (amino acids), number of OsGELP
proteins containing the motif, and E-value of each predicted motif are
given. The overall height of each column in the motif LOGO indicates
sequence conservation at that position, whereas the height of the
symbols within each column represents the relative frequency of the
corresponding amino acid. The GDSL lipase consensus blocks are
distributed as follows: motif 3 is located in block I, motif 5 in block II,
motif 6 in block III, and motif 2 in block V. Four strictly conserved
catalytic residues Ser-Gly-Asn-HisxxAsp from conserved blocks I, II, III,
and V are coloured red in the regular expression of the corresponding
motif. Regular expression sequences coloured blue and green represent
possible helix and sheet secondary structure elements, respectively.

Additional file 15: Primer sequences used for real-time PCR analysis. 
The OsGELP gene names and the sequences of the PCR primers used in
quantitative RT-PCR to verify gene expression levels are listed.
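Additional file 12 reports replicate qRT-PCR fold changes with standard-deviation error bars and compares them with Genevestigator microarray values. The sketch below shows one simple way such a summary and comparison could be computed. It is an illustration, not the authors' procedure: the helper names are hypothetical, the sample (n − 1) standard deviation is an assumption, and "agreement" is reduced here to both platforms calling the same direction of change on a log2 scale.

```python
import math
from statistics import mean, stdev


def summarise(fold_changes):
    """Mean and sample (n - 1) standard deviation of replicate fold changes."""
    return mean(fold_changes), stdev(fold_changes)


def direction(fold_change):
    """+1 for up-regulation, -1 for down-regulation, 0 for no change (log2 scale)."""
    log_fc = math.log2(fold_change)
    return (log_fc > 0) - (log_fc < 0)


def platforms_agree(qpcr_fold, array_fold):
    """True when qRT-PCR and microarray call the same direction of change."""
    return direction(qpcr_fold) == direction(array_fold)


# Hypothetical values for one gene under one treatment (not data from this study)
qpcr_mean, qpcr_sd = summarise([2.0, 4.0])
print(qpcr_mean, round(qpcr_sd, 3))     # 3.0 1.414
print(platforms_agree(qpcr_mean, 1.8))  # True
print(platforms_agree(qpcr_mean, 0.5))  # False
```

Working on the log2 scale treats a two-fold increase and a two-fold decrease symmetrically, which is why the direction test uses the sign of log2 rather than comparing raw fold changes with 1 on a linear scale.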